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ABSTRACT 

We consider the problem of embedding unweighted, directed 
k-nearest neighbor graphs in low-dimensional Euclidean 
space. The k-nearest neighbors of each vertex provide or¬ 
dinal information on the distances between points, but not 
the distances themselves. Relying only on such ordinal in¬ 
formation, along with the low-dimensionality, we recover the 
coordinates of the points up to arbitrary similarity transfor¬ 
mations (rigid transformations and scaling). Furthermore, 
we also illustrate the possibility of robustly recovering the 
underlying density via the Total Variation Maximum Penal¬ 
ized Likelihood Estimation (TV-MPLE) method. We make 
existing approaches scalable by using an instance of a local- 
to-global algorithm based on group synchronization, recently 
proposed in the literature in the context of sensor network 
localization, and structural biology, which we augment with a 
scale synchronization step. We show our approach compares 
favorably to the recently proposed Local Ordinal Embedding 
(LOE) algorithm even in the case of smaller sized problems, 
and also demonstrate its scalability on large graphs. The 
above divide-and-conquer paradigm can be of independent 
interest to the machine learning community when tackling 
geometric embeddings problems. 

Index Terms — k-nearest-neighbor graphs, ordinal con¬ 
straints, graph embeddings, eigenvector synchronization 

1. INTRODUCTION 

Embedding unweighted /c-nearest neighbor (kNN) graphs is 
a special case of ordinal or non-metric embedding, where one 
seeks a spatial embedding of n points in such that 

V(il, jl,'j2,i2) e C, ll^ii -%l|2, (1) 

where C denotes the set of ordinal constraints. Ordinal con¬ 
straints are sometimes also specified as triplets m. In the 
case of unweighted kNN graph embedding, C = C{G) = 
{(a, 6, a, c)\ab G E{G)^ac ^ E{G)^ , where E{G) is the set 
of directed edges in the kNN graph G. 

Graph-based methods are of utmost importance in several 
modem machine learning methods with applications such as 
clustering, dimensionality reduction, and ranking. Many such 
methods rely on weighted graphs, with weights often based 


on similarity functions, i.e., Wij = S{xi, xj). From a practi¬ 
cal standpoint, storing unweighted kNN graphs instead would 
allow for a very sparse representation of the data. If one 
could recover the source data {xi}^^^ from unweighted kNN 
graphs, such a computationally efficient sparser representa¬ 
tion would incur no information loss. Because of the extreme 
sparsity of the representation, this is generally a hard problem. 
Just recently, a method for recovering data distributions from 
unweighted kNN graphs was introduced in ||2l • Another mo¬ 
tivation for this problem comes from an instance of the pop¬ 
ular sensor network localization problem, where each sensor 
is able to transmit only limited connectivity information to 
a central location (ID names of its k nearest neighbors), but 
transmits neither the distance measurements nor a complete 
list of all its neighbors within a given fixed radius. 

The original work on this problem dates back to Shepard 
0 and Kruskal mm, and lately has been studied intensively 
in the machine learning literature IB 0 Cl i m m dn El 
El El El dS El- In this work, we compare against and 
extend the recent Local Ordinal Embedding method CD, 
which enjoys several favorable comparisons with other mod¬ 
ern methods. Another motivation for this problem comes 
from an instance of the popular sensor network localization 
problem, where each sensor is able to transmit only limited 
connectivity information to a central location, in the form 
of ID names of its k nearest neighbor sensors, but transmits 
neither the estimated distance measurements nor a complete 
list of all its neighbors within a given fixed radius. Note 
that either of these last two scenarios renders the localiza¬ 
tion problem (of estimating the sensor coordinates) easier 
to solve. Similar to the sensor network application, one 
could potentially apply this framework to cooperative con¬ 
trol and sensing involving swarms of robot micro-vehicles 
with limited payloads communicating via radio with limited 
bandwidth 1191 EOl- Our key ingredient is a modified ver¬ 
sion of the As-Synchronized-As-Possible (ASAP) algorithm 
introduced in El, which makes existing embedding meth¬ 
ods scalable via a divide-and-conquer, non-iterative local to 
global approach, reduces computational complexity, allows 
for massive parallelization of large problems, and increases 
robustness to noise. The ASAP algorithm introduced in ISTl . 
on which we rely in the present paper, renders our approach to 


reconstruct kNN graphs scalable to graphs with thousands or 
even tens of thousands of nodes, and is an example of a local- 
to-global approach that integrates local ordinal information 
into a global embedding calculation. 

We detail in Section O the exact approach used to de¬ 
compose the initial kNN graph into many overlapping sub¬ 
graphs, that we shall refer to as patches from now on. Each 
resulting patch is then separately embedded in a coordinate 
system of its own using an ordinal embedding algorithm, such 
as the recent Local Ordinal Embedding (LOE) algorithm ifT^ . 
In the hypothetical scenario when LOE recovers the actual 
ground truth coordinates of each patch, such local coordinates 
agree with the global coordinates up to scaling and some un¬ 
known rigid motion (such as rotation, reflection and transla¬ 
tion), in other words, up to a similarity transformation. How¬ 
ever, in most practical instances, it is unreasonable to expect 
that the LOE algorithm will recover the exact coordinates 
only from ordinal data. On a related note, we point out the re¬ 
cent work of Kleindessner and von Luxburg ll22l . who settled 
a long-known conjecture claiming that, given knowledge of 
all ordinal constraints of the form | 11 < \ \xk—xi\ \ be¬ 

tween an unknown set of points ,..., G M (for finite n), 
it is possible to approximately recover the ground truth coor¬ 
dinates of the points up to similarity transformations. Eurther- 
more, the same authors show that the above statement holds 
even when we only have local information such as the dis¬ 
tance comparisons between points in small neighborhoods of 
the graphs, thus giving hope for a local-to-global approach, in 
the spirit of the one we propose in the present paper. 


Our contributions are: 1. We present a local-to-global ap¬ 
proach for the problem of embedding clouds of points from 
ordinal information, which is scalable to very large graphs, 
and can be computed efficiently and robustly in a distributed 
manner. Specifically, we extend the ASAP framework to the 
setting of ordinal embeddings, by augmenting it with a scale 
synchronization step. We believe that local-to-global strate¬ 
gies could benefit many problems in the machine learning 
community. The scale of data involved in many interesting 
problems poses a challenge to direct, holistic approaches. 2. 
We extend the ordinal embedding pipeline to perform den¬ 
sity estimation via Total Varation Maximum Penalized Like¬ 
lihood Estimation. This demonstrates the similarity between 
the point localization and density estimation problems. Suf¬ 
ficiently simple point distributions can be well estimated by 
applying a short postprocessing step to an approximate em¬ 
bedding. 3. We present preliminary results for a very simple, 
straightforward ordinal embedding method. 


The rest of the paper is organized as follows. Section]^ is 
a summary of existing methods for related embedding prob¬ 
lems. Section [^details the pipeline of the ASAP framework, 
including the scale synchronization step in Section ^ In 
Section]^ we remark on the connection to the density esti¬ 
mation problem, and describe the post-processing step per¬ 
formed via Total-Variation Maximum Penalized Likelihood 


Estimation. Section shows the results of several experi¬ 
ments recovering point embeddings from a variety of data 
sets, and compares to the existing LOE algorithm, as well 
as presenting results for the density estimation problem. In 
Section we discuss an entirely different approach to ordi¬ 
nal embedding, and present some preliminary results which 
suggest more modifications are needed. We conclude our pri¬ 
mary discussion in Section|^and summarize in Appendix |7.1| 
some related basic notions from the rigidity theory literature. 

2. RELATED WORK 
2.1. Multidimensional Scaling 

Broadly speaking, multidimensional scaling (MDS) refers 
to a number of related problems and methods. In Classical 
Multidimensional Scaling (CMDS) (231, one is given all Eu¬ 
clidean Squared-Distance measurements Aij = \\xi — XjWl 
on a set of points X = {xi}^^^ and wishes to approximate 
the points, assuming that they approximately lie in a low¬ 
dimensional space d n. Note that the solution for the 
coordinates is unique only up to rigid transformations, and 
that solutions do not exist for all possible inputs A. 

One can generalize CMDS to incorporate additional non¬ 
negative weights Wij on each distance, useful when some 
distances are missing, or most distances are noisy, but some 
are known. The optimization involves minimizing an energy 
known in the literature as stress EH. One approach to min¬ 
imize stress is to iteratively minimize a majorizing function 
of two variables. A further generalization of MDS is non¬ 
metric MDS, or Ordinal Embedding, in which the input D 
is assumed to be an increasing function applied to distance 
measurements 0. This may be the case if D represents dis¬ 
similarity between points, as opposed to measured distances. 
The problem can again be expressed with stress functionals 
and is usually solved with isotonic regresion o. 


2.2. Semidefinite Programming methods 

Semideflnite Programming methods (SDP) have been applied 
frequently to MDS and related problems. Classical MDS can 
be stated as an SDP, with a closed form solution. Any for¬ 
mulation of the problem that optimizes over the Gram matrix 
requires the semideflnite constraint K Indeed, for met¬ 

ric MDS, if one penalizes the squared error on the squared 
distance measurements, the problem can be written as 

min y^WijiAij - Aij{X)f 
= min ^ Wij{Aij - {Ku - 2Kij + Kjj)f 

+ IJ 

s.t. K = X'^X. 



Constraints of the form K = X^X are usually not allowed 
however, and are typically relaxed to 1^1^ 


I 

X^ 


X 

K 


h 0 . 


via Schur’s Lemma. Furthermore, one encourages K to be ap¬ 
proximately low-rank by introducing a nuclear norm or trace 
penalty \\K\\^ = ||cr(Ff)||i = tr(Ff), as a convex relaxation 
of a rank constraint. Intuitively, since the ii norm promotes 
sparsity, the nuclear norm should promote few nonzero singu¬ 
lar values. Elsewhere da, it is argued that one should max¬ 
imize tr{K), in the spirit of the popular Maximum Variance 
Unfolding approach IZ7l . Neither minimizing nor maximiz¬ 
ing the trace actually imposes an exact rank constraint, which 
is non-convex and NP-hard. One approach that could achieve 
exact rank constraints would be to use the Majorized Penalty 
Approach of Gao and Sun 12811 with an alternating minimiza¬ 
tion method. 

A group of methods have studied the graph realization 
problem, where one is asked to recover the configuration of a 
cloud of points given a sparse and noisy set of pairwise dis¬ 
tances between the points |[29l [30l [STJ [32j |33l. One of the 
proposed approaches involves minimizing the following en¬ 
ergy 


min 


E 

{iJ)eE 


{\\pi-pjf 



( 2 ) 


which unfortunately is nonconvex, but admits a convex relax¬ 
ation into a SDP program. We refer the reader to Section 2 
of EB for several variations of this approach, some of which 
have been shown to be more robust to noise in the measured 
distances. 


2.3. Local Ordinal Embedding 

Terada and von Luxburg csi have recently proposed an al¬ 
gorithm for ordinal embedding and kNN embedding specif¬ 
ically, called Local Ordinal Embedding (LOE), which mini¬ 
mizes a soft objective function that penalizes violated ordinal 
constraints. 


mm > ma 

X^RdXn 

i<j,k<l,{i,j,k,l)eC 


:[0,Dij{X) + 6-Dki{X)]\ (3) 


The energy takes into account not only the number of con¬ 
straints violated, but the distance by which the constraints are 
violated, penalizing large violations more heavily. 

An advantage of this energy in contrast to ones that nor¬ 
malize by the variance of X (to guarantee nondegeneracy) is 
its relatively simple dependence on X, making the above en¬ 
ergy easier to minimize. Instead, the scale parameter S guar¬ 
antees nondegeneracy, and fixes the scale of the embedding 
(which is indeterminable from ordinal data alone). 

The authors introduce algorithms to minimizing the above 
energy, based on majorization minimization and the Broyden- 
Fletcher-Goldfarb-Shanno (BFGS) approximation of New¬ 
ton’s method, and prove that ordinal embedding is possible 


when only local information is given (e.g. a k neareast neigh¬ 
bor graph). The algorithm recovers not only the ordinal 
constraints, but the density structure of the data as well. The 
algorithm applies to ordinal constraints associated with /cNN 
graphs as well more general sets of ordinal constraints. We 
will use this crucial property when solving subproblems in 
the method presented here, as the corresponding subgraphs 
are generally not kNN graphs. 

3. ASAP & SCALE SYNCHRONIZATION FOR 
ORDINAL EMBEDDINGS 

In this section we detail the steps of the ASAP algorithm, 
central to the divide-and-conquer algorithm we propose for 
the ordinal embedding problem. Note that the difference be¬ 
tween the original ASAP algorithm introduced in II 2 TII and 
our approach lies in the decomposition method from Section 
|3.1| and the scale synchronization step from Section [3^ The 
ASAP approach starts by decomposing the given graph G into 
overlapping subgraphs (referred to as patches), which are then 
embedded via the method of choice (in our case LOE). To ev¬ 
ery local patch embedding, there corresponds a scaling and 
an element of the Euclidean group Euc(d) of d-dimensional 
rigid transformations, and our goal is to estimate the scalings 
and group elements that will properly align all the patches in a 
globally consistent framework. The local optimal alignments 
between pairs of overlapping patches yield noisy measure¬ 
ments for the ratios of the above unknown group elements. 
Finding group elements from noisy measurements of their ra¬ 
tios is also known as the group synchronization problem, for 
which Singer Ea introduced spectral and semidefinite pro¬ 
gramming (SDP) relaxations over the group SO(2) of planar 
rotations, which is a building block for the ASAP algorithm 

ED. 

Table gives an overview of the steps of our approach. 
The inputs are an ordinal graph (we consider kNN graphs) 
G = (V^E), where edge ij G E and non-edge il ^ in¬ 
dicates that dij < dii^ the max patch size parameter MPS, 
the target dimension d, and a base-case ordinal embedding 
method OrdEmhed \ G ^ X G for embedding each 

patch, such as LOE. 

3.1. Break up the kNN graph into patches and embed 

The first step we use in breaking the kNN graph into patches is 
to apply normalized spectral clustering oa to a symmetrized 
version of the graph. Normalized spectral clustering parti¬ 
tions the nodes of a graph into N n clusters by perform¬ 
ing k-means on the embedding given by the top N eigenvec¬ 
tors of the random-walk normalized graph Laplacian. It is 
shown that normalized spectral clustering minimizes a 
relaxation of the normalized graph cut problem. Next, we 
enlarge the clusters by adding the graph-neighbors of each 
node, so that the resulting patches have significant overlap, 
a prerequisite for the ASAP synchronization algorithm. The 
higher the overlap between the patches, the more robust the 
pairwise group ratio estimates would be, thus leading overall 




Algorithm 1 Modified ASAP algorithm that incorporates the scale synchronization step. 


INPUT 

G = (V,E), |y| = n, \E\ = m, MPS, d, OrdEmbed{-) 

Choose Patches 
Embed Patches 

1. Break G into N overlapping globally rigid patches Pi,..., Pn following the steps in Sec. 3.1 

2. Embed each patch Pi separately based on the ordinal constraints of the corresponding subgraph of G using 
OrdEmbed{-). 

Step 1 

Scale 

1. If a pair of patches {Pi, Pj) have enough nodes in common, let Kij be the median of the ratios of distances 
realized in the embedding of Pi and their corresponding distances in Pj as in (|^; otherwise set Kij = 0. 

2. Compute the eigenvector Vi corresponding to the largest eigenvalue of the sparse matrix A. 

3. Scale each patch Pihy Vi{i), for i — 1,... ,n 

Step 2 

Rotate & Reflect 

1. Align all pairs of patches {Pi, Pj) that have enough nodes in common. 

2. Estimate their relative rotation and possibly reflection Hij G 0{d) C 

3. Build a sparse dN x dN symmetric matrix H = {Hij) where entry ij is itself a matrix in 0{d). 

4. Define H = D~^H, where P> is a diagonal matrix with 

Di+d{i-i),i+d{i-i) = ... = Ddi^di = deg{i), z = 1,..., A, where deg{i) is the node degree of patch Pi. 

5. Compute the top d eigenvectors of H satisfying Hvl^ — X^vl^ ,i — 1,... ,d. 

6. Estimate the global reflection and rotation of patch Pi by the orthogonal matrix hi that is closest to Hi in 
Erobenius norm, where Hi is the submatrix corresponding to the i* patch in the dN x d matrix formed by 
the top d eigenvectors [vl^.. .v^]. 

7. Update the embedding of patch Pi by applying the above orthogonal transformation hi 

Step 3 Translate 

Solve m X n overdetermined system of linear equations ([6|) for optimal translation in each dimension. 

OUTPUT 

Estimated coordinates xi,... ,Xn 


to a more accurate final global solution. Finally, we use an it¬ 
erative procedure to remove nodes from the graph relying on 
tools from rigidity theory. If a patch is not globally rigid, 
we drop a constant fraction of the added nodes. At each round 
we choose to drop a quarter of the nodes with the lowest de¬ 
gree while retaining all nodes that were in the original cluster 
generated by k-means in the corresponding patch. This uses 
the heuristic that low-degree nodes tend to render a graph not 
globally rigid. After dropping nodes, we check the remaining 
patch for globally rigidity again. We stop the pruning process 
when the patch contains fewer than 4/3 the number of nodes 
in the original cluster, or the patch is globally rigid. 

We refer the readers to Appendix |7.1| for for a brief de¬ 
scription of global rigidity, and relevant results in the litera¬ 
ture, and use the remainder of this section as a brief discus¬ 
sion of the main definitions. In the graph realization prob¬ 
lem (GRP), one is given a graph G = {V^E) together with a 
non-negative distance measurement dij associated with each 
edge, and is asked to compute a realization of G in In 
other words, for any pair of adjacent nodes i and j, the dis¬ 
tance dij = dji is available, and the goal is to find a d- 
dimensional embedding ^ such that \\pi — 

PjW = dij, forall(i,j) G E. The main difference between 
the GRP and the problem we aim to address in our paper is 
the input information available to the user. Unlike the GRP 
problem where distances are available to the user, here we 
only have information of the adjacency matrix of the graph 
and have the knowledge that it represents a kNN graph. Both 
problems aim to recover an embedding of the initial configu- 

graph is globally rigid if all realizations of it are congruent up to a 
rigid transformation. 


ration of points. 

A graph is globally rigid in if there is a unique (up 
to the trivial Euclidean isometries) embedding of the graph 

such that all distance constraints are preserved. It is well 
known that a necessary condition for global rigidity is 3- 
connectivity of the graph. Since the problem at hand that we 
are trying to solve is harder (as we do not have distance infor¬ 
mation available) we require that the patches we generate are 
globally rigid graphs. Even in the favorable scenario when 
we do have available distance measurements (which we do 
not in the present problem, but only ordinal information), any 
algorithm seeking an embedding of the graph would fail if 
the graph were to have multiple non-congruent realizations. 

3.2. Scale Synchronization 

Before applying the original ASAP algorithm to the embed¬ 
ded patches, we introduce an additional step that further im¬ 
proves our approach and is independent of the dimension d. 
In the graph realization problem which motivated ASAP, one 
is given a graph G = {V,E) and non-negative distance mea¬ 
surement dij associated with each edge ij G E{G), and is 
asked to compute a realization of G in The distances are 
readily available to the user and thus the local embedding of 
each patch is on the same scale as the ground truth. How¬ 
ever, in the kNN embedding problem, distances are unknown 
and the scale of one patch relative to another must be approx¬ 
imated. Any ordinal embedding approach has no way of re¬ 
lating the scaling of the local patch to the global scale. To this 
end, we augment the ASAP algorithm with a step where we 
synchronize local scaling information to recover an estimate 
for the global scaling of each patch, thus overall synchroniz- 

















ing over the group of similarity transformations. 

We accomplish this as follows. Given a set of patches, 
{Pi}iLi^ we create a patch graph in which two patches are 
connected if and only if they have at least d + 1 nodes in 
common. We then construct a matrix A G as 


Aij — < 


median 

0 


1^1 

IVJa 


^bePiCiPj 


if Pi ~ Pj, i < j, 
if Pi ~ Pj, i > j, 
Otherwise, 


(4) 

where is the distance between nodes a and b as realized 
in the embedded patch Pi. The matrix A approximates the 
relative scales between patches. If all distances in all patches 
were recovered correctly up to scale, and all patches had suf¬ 
ficient overlap with each other, then each row of A would be a 
scalar multiple of the others and each column of A would be 
scalar multiple of the others, thus rendering A a rank-1 ma¬ 
trix. For the noisy case, in order to get a consistent estimate 
of global scaling, we compute the best rank-1 approximation 
of A, given by its leading eigenvector . We use this ap¬ 
proximation of global scaling to rescale the embedded patches 
before running ASAP. Note that the connectivity of the patch 
graph and the non-negativity of A guarantee, via the Perron- 
Frobenius Theorem, that all entries of are positive. We 
refer the reader to Figure which illustrates on an actual ex¬ 
ample the importance of this scaling synchronization step. 


3.3. Optimal Rotation, Reflection and Translation 

After applying the optimal scaling to each patch embedding, 
we use the original ASAP algorithm to integrate all patches 
in a global framework, as illustrated in the pipeline in Fig¬ 
ure!^ We estimate, for each patch Pi, an element of the Eu¬ 
clidean group Euc(d) = 0(d) which, when applied to 
that patch embedding Pi, aligns all patches as best as pos¬ 
sible in a single coordinate system. In doing so, we start 
by estimating, for each pair of overlapping patches Pi and 
Pj, their optimal relative rotation and reflection, i.e., an el¬ 
ement Hij of the orthogonal group 0(d) that best aligns Pj 
with Pi. Whenever the patch embeddings perfectly match the 
ground truth, P[ij = OiOj^. We refer the reader to 1^ for 
several methods on aligning pairs of patches and computing 
their relative reflections and rotations Hij. Finding group el¬ 
ements {Oi}fLi from noisy measurements Hij of their ratios 
is also known as the group synchronization problem. Since 
this problem is NP-hard, we rely on the spectral relaxation 
El of 


Oi 


min 

■ •>On ^0{d) 


E 110*07' 

Pi-Pj 




(5) 


for synchronization over 0(2), and estimate a consistent 
global rotation of each patch from the top d eigenvectors of 
the graph Connection Laplacian, as in Step 2.4 in Table 
We estimate the optimal translation of each patch by solving, 
in a least squares sense, d overdetermined linear systems 

Xi - Xj = xf'' - xf\ {i,j)eEk, k = l,...,N, (6) 



Recovered patch 

Fig. 1. ASAP and scale synchronization pipeline. 

(k) 

where Xi, respectively x) \ denote the unknown location of 
node i we are solving for, respectively, the known location of 
node i in the embedding of patch P ^. We refer the reader to 
ED for a description of computing the optimal translations. 


3.4. Extension to higher dimensions 

Although we present experiments here on 2D and 3D data, 
the ASAP approach extends naturally to higher dimensions. 
In the 3D case, ASAP has been recently used as a scalable 
robust approach to the molecule problem in structural biol¬ 
ogy ll36l . For the d-dimensional general case, one can extend 
ASAP by first using the same approach for scaling synchro¬ 
nization from Section 3.2 then synchronizing over 0(d), and 
finally estimating the optimal translations over by solv¬ 
ing d overdetermined systems of linear equations via least- 
squares. The LOE approach that can be used to obtain the 
local patch embeddings required by ASAP, has a natural ex¬ 
tension to the d-dimensional case, thus rendering the entire 
pipeline amenable to dealing with higher-dimensional data. 


4. DENSITY ESTIMATION 

In this section, we remark on the explicit connection between 
the graph embedding problem considered in this paper and 
the density estimation problem. In particular, one may ap¬ 
proach the problem of recovering the unknown coordinates 
underlying the kNN graph by first aiming to estimate the den¬ 
sity function that generates the coordinates. Suppose for ex¬ 
ample that one is able to estimate the pointwise density u : 

C —> [0,1], up to some constant multiple, evaluated 

at each vertex of the graph, Xi. Next, as outlined in El, one 
can assign weights to the originally unweighted kNN graph, 
defined by u;(xi,Xj) = (u~^^^(xi)Pu~^^^(xj))/2. Further¬ 
more, it can be shown that the shortest path distance in the 
resulting weighted kNN graphs converges to the Euclidean 
distance of the original points as the number of points in¬ 
creases. In other words, applying multidimensional scaling 
to the shortest path distances on the weighted kNN graph will 
yield increasingly accurate embeddings of the original points 
as n ^ -hoo. 











In contrast to finding an approximate embedding from a 
density estimate, under certain conditions, the reverse process 
is also straightforward. With sufficiently many points and suf¬ 
ficiently strong priors on the distribution, the methodology 
of Maximum Penalized Likelihood Estimation (MPLE) ap¬ 
plies Ell. One first assumes that the locations correspond to 
points drawn independently identically distributed according 
to some unknown underlying spatial distribution. MPLE ap¬ 
proximates the most likely spatial distribution given the points 
observed and some assumed prior distribution on the space 
of distributions. The data fidelity term comes in the form of 
a log-likelihood term, a function of the distribution estimate 
and the point locations, and is given by 

n 

L{u, = ^log(«(a;i)), 

i=l 

and the penalty term, P{u) enforces the prior distribution 
on the space of distributions. Typical choices for P{u) in¬ 
clude the iif^-seminorm regularizer, P{u) = ^ Jq \Vu\^dx, 
enforcing smoothness, and Total Variation (TV) norm regu¬ 
larization, P{u) = A \ Vu\dx, which enforces smoothness, 
but also allows for edges. Therefore, general MPLE seeks 
to optimize the following energy over all probability distribu¬ 
tions on the spatial domain Q 

u = argmax „>0 udx=iL{u, {xi}'^=i) - Pi.u). 

The form and scale of P encodes different types and amounts 
of regularity in the resulting density estimate u. In practical 
settings, cross-validation should be performed to determine 
the appropriate amount of regularity to impose on a given data 
set. 

Eor the purpose of using kNN graphs to recover densi¬ 
ties, we will include a post-processing step for a subset of the 
embedding experiments, to which we apply a standard imple¬ 
mentation of TV MPLE |[38l to the embedded points. TV is 
a good choice of penalty because we will be applying it to 
points that are drawn from a piecewise constant density. The 
good density estimates based on good embeddings shown in 
Section illustrate that there is in fact a strong connection 
between the embedding and density estimation problems. 

The actual implementation of the TV MPLE relies on the 
Split Bregman (equivalently Alternating Direction Method 
of Multipliers) minimization technique in which one intro¬ 
duces a splitting and equality constraints that are enforced 
by performing saddle-point optimization of the augmented 
Lagrangian. This results in an iterative update procedure 
given by Algorithmic The first minimization step is actually 
replaced by minimizing over u, and d individually, making 
use of the shrinkage proximal operator associated with the 
norm. 

5. EXPERIMENTS 

Our experiments compare embeddings of points drawn from 
three different 2D synthetic densities: piecewise constant 


Algorithm 2 TV MPLE 
INPUT: 

^ = 0 ,^ = 0 

Eor number Iterations { 

(u,Ij = 

^ i = l 

P d^ y\\\ P ^ -Ip zf^ 

y =y P Vfi — d 
z=zP ||fi||i - 1 

} 


half-planes (PC), piecewise constant squares (PCS), and 
Gaussian (Gauss), and a 3D synthetic denisty : piecewise 
constant half-cubes (halfcube), each with n = {500,1000, 5000} 
points, as well as points drawn uniformly from a 3D donut 
shape (Donut) with n = 500, and the actual 2D coordinates 
of n = 1101 cities in the US (US cities). Eor a given set of 
data points, we use its kNN adjacency matrix as input to each 
ordinal embedding method. Separate from these datasets with 
a clear correct geometric embeddings, we find embeddings 
of points in a co-authorship network of network scientists 
(NetSci2010) with n = 552 (see Section [5^ . We test Lapla- 
cian Eigenmaps 1^ . the LOE BEGS and LOE MM methods 
[TSl, and ASAP with LOE BEGS used for the patch embed¬ 
dings. As LOE was already compared with several methods 
in ifTsl . attaining better performance than LOE may suggest 
better performance than a number of relevant methods includ¬ 
ing Kamada and Kawai ll4Ql . and Eruchterman and Reingold 
ED- We remark that our approach deals with a different 
input than that of the t-SNE method in 14^ . which is gen¬ 
erally used for embeddings of high dimensional data where 
some of the constraints are deliberately violated, which is not 
necessarily the case in our setting. We evaluate the methods 
based on (wall-clock) runtime and two different error metrics, 
Procrustes alignment error||43l, and A-error (Sa) defined as 
the percentage of edge disagreements between the kNN adja¬ 
cency matrix of the proposed embeddeding X and the ground 
truth 


error(X,X) : ^ |(4)_. - 


n^ 


(J) 


where A\ G {0,1}’^^’^ denotes the adjacency matrix of the 
corresponding kNN graph. We set varying limits on the num¬ 
ber of LOE iterations {5,10, 50,100,300, 500}, and we use 
varying maximum patch sizes (MPS) for ASAP . The LOE 
and ASAP methods give, for each distribution and values n 






and k, an error-runtime Pareto curve (with low values in both 
coordinates being best). In Tablewe establish some short¬ 
hand notation for the methods and parameters used in this 
section. For fair comparisons, we pass the same randomly 
sampled data to each of the methods. Ideally, one would run 
these experiments many times over and average the results 
(to get an estimate of average performance), but this is ef¬ 
fect already partially accomplished by running the LOE and 
ASAP methods with multiple parameters to get a more holis¬ 
tic measurement of performance. It is worth mentioning that 
while LOE BEGS and LOE MM are iterative methods which 
should converge to the best estimate of the solution as the 
number of iterations increases, ASAP is not iterative and the 
results of ASAP LOE with a given MPS, do not inform the 
results of ASAP LOE with another MPS. This aspect, com¬ 
bined with the randomized k-means spectral clustering used 
to choose patches means that we do not generally expect the 
recovery errors of ASAP LOE to be monotonically decreasing 
with MPS or time (as higher MPS generally leads to longer 
computational time). A principled way of choosing the best 
MPS for a given application of ASAP LOE could be of further 
interest. 


Recovery Method 


LE 

Laplacian Eigenmaps embedding 

LOE MM 

Local Ordinal Embedding using ma- 


jorization minimization 

LOE BFGS 

Local Ordinal Embedding using BFGS 

ASAP LOE 

ASAP & LOE BFGS patch embeddings 

Parameters 


sparse k 

k = [2 log(n)] 

dense k 

k = |‘^nlog(n)] 

MPS 

maximum patch size (for ASAP) 

Iter. 

number of iterations (of LOE) 

Data sets 


PC 

2D piecewise constant half-planes 

PCS 

2D piecewise constant squares 

Gauss 

2D Gaussian 

halfcube 

3D piecewise constant half-cubes 

Donut 

3D Donut 

US cities 

2D coordinates of US cities 

NetSci2010 

co-authorship network of scientists 


Table 1. Notation for plotting experimental results. 


5.1. The need for scale synchronization 

Eirst, to illustrate the importance of the scale synchroniza¬ 
tion introduced in Section [T^ we compare in Eigurej^ASAP 
synchronized embeddings with and without this step. Clearly, 
this step significantly improves the recovered solutions. 


Ground Truth ASAP LOE MPS400 ASAP LOE no scale MPS4( 



Fig. 2. Left: Ground truth, n = 1000, k = 14. Middle: 
ASAP LOE with scale synchronization: £a = 0.007. Right: 
ASAP LOE without scale synchronization: £a = 0.038. 


5.2. Simulations with n = 500,1000, 5000 with sparse 
and dense k 

We show £a versus runtime for recovering n = {500,1000, 5000} 
points sampled from the PC (Eigurej^, PCS (Eigurej^, and 
Gaussian (Eigurej^, with each figure showing results in the 
sparse and dense k regime (see Table [^. We also show £a 
versus runtime for n = {500,1000, 5000} points drawn from 
the halfcube (Eigurej^ distribution for k = 50,150250,450. 
Even for lower values of n, we find that ASAP LOE is of¬ 
ten either faster than or better-performing than LOE BEGS, 
or both. This seems to be especially true in the sparse k 
domain. This is partly due to the massively parallel embed¬ 
ding step in ASAP, which can take advantage of multiple 
cores as the problem scales. One would expect that as n 
continues to grow, if more processors are made available and 
memory increases sufficiently, the advantage of embedding 
parallelization would continue to increase. 

To further illustrate how the methods perform, we plot the 
embeddings of n = 1000 point sampled from the 2D densi¬ 
ties in Eigure]^ In each case, the ASAP LOE with MPS=400 
takes less time to run and yields smaller £a errors than the 
LOE BEGS with 100 maximum iterations. We only run LOE 
MM for n = 500 because of difficulties we had when trying 
to get the provided R implementation to run on our Linux- 
based remote computing resource. We ran into no problems 
with the LOE BEGS implementation. The computers used 
have 12 CPU cores which are Intel(R) Xeon(R) X5650 @ 

2.67GHz, and have 48GB ram. The R implementation of 
LOE does not (as far as its authors are aware) take advan¬ 
tage of multiple cores, and runs a single process on a single 
core. In contrast, our ASAP Matlab implementation uses the 
Multicore package to divide up the local embedding problems 
among the available cores. 

To demonstrate that this approach is not limited to the 
2D case, nor does it only perform well on synthetic data, 
we plot in Eigure the embeddings Procrustes aligned with 
points sampled from a 3D donut shape, and actual coordi¬ 
nates of n = 1101 US cities. In both cases, ASAP LOE with 
MPS=300 runs faster and yields smaller £a than LOE BEGS 
with 500 maximum iterations, the latter of which produces 
twisted or folded results. 










PC 


PC 


PCS 


PCS 






Fig. 3. £a vs. time, n = {500,1000, 5000}, Left: k sparse. 
Right : k dense, piecewise constant half-planes, o ASAP 
LOE, X LOE BEGS, □ LE , ★ LOE MM 


MPS 

100 

300 

500 

?CS£a 
PCS A 

5.1 X 10-4 
5.8 X 10-4 

5.6 X 10-4 

4.7 X 10-4 

1.9 X 10-4 
3.0 X 10-4 


Table 2. Recovery results for n = 50,000 for ASAP LOE. 


5.3. Large n : 50,000 

In Table 1^ we show £a vs runtime for ASAP LOE on a data 
set of n = 50,000 points and k = 22. While this size 
is completely prohibitive for LOE BEGS, ASAP LOE pro¬ 
duces good results in less than 4 hours. The worst possible 
result would be all edges of original graph misplaced, mean¬ 
ing £A = 2’50k’ 22/{b0k)‘^ = 8.8 x 10“^. f a = 2 x 10“^ 
means we get approximately 3/4 of the edges correct. 


Fig. 4. £a vs. time, n = {500,1000, 5000}, Left: k sparse. 
Right : k dense, piecewise constant half-planes, o ASAP 
LOE, X LOE BEGS, □ LE , ★ LOE MM 


of k small relative to n and not too large relative to MPS, 
ASAP LOE BEGS returns sensible, although not perfect re¬ 
sults. As k gets too large however the results are quite poor. 
We suspect this is a result of k being too large relative to 
MPS, leading to patches which are overly dense. When an 
ordinal graph contains nearly all possible edges, it essentially 
provides no information. When such data is of specific in¬ 
terest, one could either increase the mps as computational re¬ 
sources and time allow, or potentially use an alternate method 
for breaking the graph into overlapping patches which are not 
too dense. 


5.4. Increasing k 


We show in Eigure 10 scaled (n/k) x NumberNonzero(A — 
Aq) (this is a rescaling of £a proportional to number of mis¬ 
placed edges, more comparable for different values of k) and 
procrustes error versus increasing values of k for n = {5000} 
points drawn from the piecewise constant half-planes distri¬ 
bution using the method ASAP LOE with MPS=300. We 
see that for large n, adjacency matrix error and Procrustes 
error remain relatively small and stable over a range of small 


increasing k. Additionally, we show in Eigure 16 some of 
the embeddings corresponding to these results. Like the Pro¬ 
crustes error plot, these embeddings suggest that for a range 


5.5. Density Estimation Experiments 


In Eigure [TT] we show the results of applying TV MPLE to 
some of the embeddings shown in Eigure [7] The regulariza¬ 
tion parameter used is .0001 . This is not obtained by cross- 
validation, but it simply seems to perform well on the origi¬ 
nally sampled points. The densities of the approximate em¬ 
beddings are as expected, with ASAP LOE BEGS recovering 
the density best, with LOE BEGS behind, and LE doing the 
worst. This altogether suggests that better embedding results 
do lead to better density estimation, if that is the end goal. 
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Fig. 5. £a vs. time, n = {500,1000, 5000}, Left: k sparse. 
Right : k dense, Gaussian density o ASAP LOE, X LOE 
BEGS, □ LE , ★ LOE MM 
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Fig. 11. TV MPLE applied to example embeddings of PC 
n = 1000, k dense, and top left : LE, top right : LOE 
BEGS maxlt=100, bottom left: ASAP LOE BEGS max patch 
size 400, bottom right : estimated density from ground truth 
points, see column 1 of Eigurej^ 
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Fig. 6. £a vs. time, n = {500,1000,5000}, k = 
50,150,250,450, 3D half-cube density o ASAP LOE, x 
LOE BEGS, □ LE , ★ LOE MM 
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Fig. 12. TV MPLE applied to example embeddings of PCS 
n = 1000, k dense, and top left : LE, top right : LOE 
BEGS maxlt=100, bottom left: ASAP LOE BEGS max patch 
size 400, bottom right : estimated density from ground truth 
points, see column 2 of Eigurej^ 


5.6. Network of network scientists embedding 

To further illustrate potential of ordinal embedding to broad 
categories of data, we present here an experiment embedding 
data that does not have an apparent ground-truth geometry. 
We use data from a co-authorship network of network sci¬ 
entists l44l from 2010, which was studied in ED to evalu¬ 
ate methods of computing core-periphery structure. The net¬ 
work contains nonegatively weighted undirected edges where 
the weights are based on the number of papers they have co¬ 
authored. The network has 552 nodes and 1318 edges, with 
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Fig. 7. Embeddings for the PC (left), PCS (middle), and 
Gauss (right) data sets with n = 1000, and k dense. Row 
1 : LE. Row 2: LOE BEGS Iter.=100. Row 3: ASAP LOE 
with MPS = 400 (with each ASAP result obtained is less time 
than the corresponding LOE result). Row 4: ground truth. 


the number of edges attached to each node ranging from 1 
to 38. The mean number of edges attached to each node is 
4.7754 and the median is 4. 

To embed the data, we treat the co-authorship links as 
nearest neighbor relationships. In other words, if X and Y 
have authored papers together, but X has not authored any 
papers with Z, we impose that the distance between X and Y 
should be smaller than the distance between X and Z. We used 
LOE BEGS and ASAP LOE BEGS to perform these embed¬ 
dings in 2D and 3D. In this case, the LOE results were ulti¬ 
mately best with the 2D LOE BEGS 500 iteration embedding 
misplacing 754 of the 1318 nearest neighbor edges and the 
3D LOE BEGS 500 Iteration embedding misplacing 272 of 
the nearest neighbor edges, while the 2D ASAP LOE BEGS 
mps 500 misplaced 910 edges, and the 3D ASAP LOE BEGS 
mps 500 misplaced 467 edges. That being said, several of the 
runtimes for the ASAP LOE results beat the LOE results. We 
speculate that the reasons LOE outperforms ASAP LOE in 
accuracy in this case are twofold : 1) the number of nodes. 


LOE BFGS 100 it LOE BFGS 100 it 

haifcube n=1000 k=50 haifcube n=1000 k=150 



Fig. 8. Embeddings for haifcube data sets with n = 1000, 
and k = bO (left), 150 (middle), 450 (right) Row 2: LOE 
BEGS Iter.=100. Row 3: ASAP LOE with MPS = 300 (with 
each ASAP result obtained is less time than the corresponding 
LOE result). Row 4: ground truth. 


n = 552, is too small to make the LOE method applied to the 
full data sufficiently intensive, and 2) the wide distribution of 
degrees of the nodes in the network perhaps does not go well 
with our approach of breaking up the network via spectral 
clustering. Perhaps other methods for braking up the network 
should be considered when the degree distribution is highly 
varied. 


Independent of the comparison of the two methods, we 
look at the best 2D and 3D embeddings from LOE (shown in 
Eigure and Eigure respectively), to see if the embed¬ 
dings preserve any interesting structure in the network. Since 
the network was previously studied for core-periphery detec¬ 
tion, we color the nodes based on the corescore computed by 
the method proposed in ||45]| (mapping low values to blue and 
high values to red), and label the names of the authors with 
the top 10 corescores. These red, core authors appear primar¬ 
ily central to the embeddings, suggesting that these embed¬ 
dings preserve important structural properties in the original 
network. 
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Fig. 14. LOE BFGS 3d embeddings of data from NetSci2010 
data set, n = 552, where co-authorship imposes that authors 
should be close 


6. A LINEAR PROGRAM ALTERNATIVE TO SDP 
EMBEDDING 


Fig. 9. Embeddings of Donut (3D) and US Cities (2D) data 
sets. Row 1: LOE BFGS Iter.=500. Row 2: ASAP LOE 
MPS=300 (with each ASAP result obtained in less time than 
the corresponding LOE result). Row 3: Ground truth. 

PC, n=5000 PC, n=5000 

ASAP LOE MPS 300 ASAP LOE MPS 300 




Fig. 10. ASAP LOE MPS=300, n = 5000, k increasing by 
20, Left: number of differences in adjacency matrix divided 
by number of edges, nk, Right: Procrustes error. 
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Fig. 13. LOE BFGS 2d embeddings of data from NetSci2010 
data set, n = 552, where co-authorship imposes that authors 
should be close 


In this section we present the algorithm and a few results for 
a Linear Program Embedding approach using metric MDS 
(LPEm) for ordinal embedding. Though the results are ul¬ 
timately not competitive with Local Ordinal Embedding, the 
approach is different enough so that the ideas may be of in¬ 
dependent interest. In contrast to the SDP methods which 
cast embedding problems in terms of the Gram matrix K our 
LPEm approach for kNN embedding optimizes over the vari¬ 
ables D (the distance matrix), R (the radius at each node), 
and the slack variables. The radius at each node i, denoted by 
Ri is defined to be the distance between node i and its k-th 
closest neighbor. Thus Ri is the radius of the neighborhood 
at node i. In kNN embedding, the objective and constraints 
can be written as linear constraints in D^R and the slack vari¬ 
ables, altogether leading to a linear program which is compu¬ 
tationally cheaper to solve than an SDP. Although SDP-based 
methods can encompass a larger class of problems, they cur¬ 
rently do not approach the scalability or numerical maturity 
of LP and SOCP solvers. 

After the LP returns a candidate distance matrix D and 
radii R, we pass D into a standard mdscale, here using metric 
multidimensional scaling (see Algorithm |^, where by T we 
mean the set of triangle inequalities we considered (ordered 
set {i^j^k)). If {i^j^k) G T, the same holds true for the 
two other permutations. The full set of triangle inequalities 
are necessary, though not sufficient, for the matrix D to cor¬ 
respond to an Euclidean distance matrix. If one omits slack 
variables, there are n{n — l)/2 distance values to solve for 
along with n radii, and thus n(n + l)/2 unknowns in total. 
Considering the ordinal constraints, for the upper bounds on 
the entries Dij, there are n ways to choose i, and for each 
i there are k ways to choose j, thus nk/2 constraints (ac¬ 
counting for symmetric distances). For the lower bounds on 







Algorithm 3 LP approach 


=argmin 

a,l3,R,D 


subject to 


X =mds {D*,d) 


iie-E(G) ij^EiO 

a,pG RgRI,Dg 

Dij < Ri + ctij, if ij € E{G) 
Dij > Ri- Pij, if ij ^ E{G) 

n 

J2Ri = V 

i=l 

Dij H- Di]^ < Dj^i^ k) ^ D 


the entries Dij there are n ways to choose i and for each i 
there are n — /c — 1 ways to choose j, giving n{n — k — 1)/2 
constraints. So there are n{n — l)/2 ordinal constraints on 
relating the n(n — l)/2 distances and n radii. In other words, 
the intuition behind the added triangle inequalities is that they 
help to better constrain the system. There are on the order of 

triangle inequalities (choose any three points), so for large 
n, there are many more constraints than unknowns. 

To avoid the added complexity from imposing all triangle 
inequalities, one could consider models that impose only a 
fraction of such constraints via either imposing them locally, 
for k-hop neighboring triples of points, or globally, such as 
picking edges via an Erdos-Renyi model, or mixing the two 
approaches. 

We remark that dropping triangle inequalities altogether 
could certainly speed up the embedding process. The result¬ 
ing non-metric D may correspond to an increasing function 
of distance (e.g., distance squared), which suggests that non¬ 
metric MDS would be appropriate. 


In general, even if the recovered distance metric corre¬ 
sponds to a metric distance, this is not a guarantee that the dis¬ 
tance is realizable in a low-dimensional space. That requires 
a rank constraint on D, which is non-convex and is computa¬ 
tionally intractable for an LP or SDR The ultimate embedding 
into a low-dimensional space thus potentially gives up some 
structure in both the LP and SDP formulation, and it can be 
argued that this effect is lessened via the local to global ap¬ 
proach. 


In Ligure 15 we show an example with points drawn 
from the densities discussed in the previous section along 
with points embedded using the LPEm approach. In these 
experiments we use a very dense value of k, k = n/2 = bO, 
which is where the approach seemed to work the best. The 
recovery of the piecewise constant half-planes is the best, 
but the preliminary results led us to decide not to experiment 
with this method further for the time being. The method was 
implemented using the C VX library, a package for specifying 


and solving convex programs ((461 BZl). Overall, we find 
the LP formulation appealing due to its simplicity. It would 
be interesting if a similarly simple approach could obtain 
competitive results on the problem of ordinal embedding, 
especially since until the work of von Luxburg and Alamgir 
||2), it was unknown to the community whether the problem 
was practically solvable at all. 


LPEm 

PC n=100 k=22 



LPEm 

PC n=100 k=50 



LPEm 

Gauss n=100 k=22 



LPEm 

Gauss n=100 k=50 



Ground Truth 
Gauss n=100 

0 0 


Ground Truth 
PCSn=100 



0 


O^O^O 


0 Oo oW 0^ 








Oa 


00 


0 

0 


Fig. 15. Linear Program Embeddings for the PC (left), PCS 
(middle), and Gauss (right) data sets with n = 100, Row 1 : 
k = 22 Row 2: k = 50 Row 3: ground truth. Line segments 
highlight the displacement of each point. 


7. SUMMARY AND DISCUSSION 

We have demonstrated that the computational efficiency of 
LOE for the kNN embedding problem can be significantly 
improved, while maintaining and often improving spatial and 
ordinal accuracy in a distributed setting. Our application of 
the divide-and-conquer ASAP method renders the problem of 
kNN embedding significantly more tractable, distributing the 
embedding steps, and using fast spectral methods to combine 
them. We expect that such improvements will make it possi¬ 
ble to use kNN embeddings in a broader range of settings, and 
that the ASAP framework will be of independent interest to 
the machine learning community for tackling large geometric 
embedding problems. 
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7.1. Rigidity Theory Appendix 

One of the main questions in the field of rigidity theory asks 
whether one can uniquely determine (up to rigid transforma¬ 
tions, such as translations, rotations, reflections) the coordi¬ 
nates of a set of points ,..., Pn given a partial set of dis¬ 
tances dij = \ \pi— Pj 11 between n points in To make our 
paper self-contained, this short appendix if a very brief sum¬ 
mary of the main definitions and results related to local and 
global rigidity from the literature (e.g., UHl |49l |50l [STJ and 
references therein]). Readers who are unfamiliar with rigidity 
theory may use this short Appendix as a glossary. As previ¬ 
ously discussed in Section [3T| one of the steps of the divide- 
and-conquer approach proposed for the kNN-recovery prob¬ 
lem relies to testing whether the underlying resulting patches 
are globally rigid. As observed in our numerical simulations 
detailed in Eigures [S)?) [5) [7] the final reconstruction is more 

accurate when we rely on global rigidity as a postprocessing 
step for the partitions obtained via spectral clustering. The 
intuition behind our approach is as follows. In the case when 
distance information is available, testing for global rigidity is 
a crucial step in making sure that each of the local patches has 
a unique embedding in its own reference frame, approxima- 
tively consistent with the ground truth, up to a rigid transfor¬ 
mation. Since in the kNN-recovery problem, we do not have 
distance information but only ordinal data, thus we are faced 
with solving even a harder problem, we expect that the global 
rigidity check will improve the accuracy of the local patch 
embeddings. One specific example where our current rigid¬ 
ity heuristics improved results was in performing ASAP LOE 
BEGS with max patch size 300, on n = 5000 points drawn 
from the constant half-plane distribution, letting k = 18. In 
that example, performing the rigidity check and pruning gave 
a runtime of 107.056 s, an ordinal error of 0.00107096, and 
0.0585465 Procrustes error, while skipping the rigidity check 
and pruning gave a runtime of 192.606 s, an ordinal error of 
0.00154208 A error, and 0.175992 Procrustes error. 

A bar and joint framework in is defined as an undi¬ 
rected graph G = {V,E) {\V\ = n,\E\ = m) together with a 
configuration p which assigns a point Pi in to each vertex 
i of the graph. The edges of the graph correspond to distance 


constraints, that is, (i,jf) G if an only there is a bar of 
length dij between points pi and pj . We say that a framework 
G{p) is locally rigid if there exists a neighborhood U of G{p) 
such that G{p) is the only framework in U with the same set 
of edge lengths, up to rigid transformations. In other words, 
there is no continuous deformation that preserves the given 
edge lengths. A configuration is generic if the coordinates 
do not satisfy any non-zero polynomial equation with integer 
coefficients (or equivalently algebraic coefficients). 

Local rigidity in has been shown to be a generic prop¬ 
erty, in the sense that either all generic frameworks of the 
graph G are locally rigid, or none of them are. A conse¬ 
quence of the seminal results of Gluck ll52l and Asimow and 
Roth 15^ asserts that the dimension of the null space of the 
rigidity matrix is the same at every generic point, and hence 
local rigidity in is a generic property, meaning that either 
all generic frameworks of the graph G are locally rigid, or 
none of them are. With probability one, the rank of the rigid¬ 
ity matrix that corresponds to the unknown true displacement 
vectors equals the rank of the randomized rigidity matrix. A 
similar randomized algorithm for generic local rigidity was 
described in 1491 Algorithm 3.2]. In other words, generic lo¬ 
cal rigidity in can be considered a combinatorial property 
of the graph G itself, independent of the particular realization. 
Using this observation, generic local rigidity can therefore be 
tested efficiently in any dimension using a randomized algo¬ 
rithm l5Qll : one can just randomize the displacement vectors 
Pi,..., Pn while ignoring the prescribed distance constraints 
that they have to satisfy, construct the so called rigidity matrix 
corresponding to the framework of the original graph with the 
randomized points and check its rank. This is approach we 
use to make sure the obtained patches are local rigid. 

Since local generic rigidity does not imply unique real¬ 
ization of the framework, it is possible that there exist multi¬ 
ple non-congruent realizations that satisfy the prescribed dis¬ 
tances (which we do not even have available in the kNN re¬ 
covery problem) One may consider for example, the 2U-rigid 
graph with n = 4 vertices and m = 5 edges consisting of 
two triangles that can be folded with respect to their joint 
edge. We call a framework G{p) globally rigid in if all 
frameworks G{q) in which are G(p)-equivalent (have all 
bars the same length as G{p)) are congruent to G{p) (i.e., re¬ 
lated by a rigid transformation). Hendrickson proved two key 
necessary conditions for global rigidity of a framework at a 
generic configuration: 

Theorem 1 (Hendrickson 1501 ). If a framework G{p), other 
than a simplex, is globally rigid for a generic configuration p 
in then: 

• The graph G is vertex {d - 1 - 1)-connected; 

• The framework G{p) is edge-2-rigid (or, redundantly 
rigid), in the sense that removing any one edge leaves 
a graph which is infinitesimally rigid. 


We say that a graph G is generically globally rigid in if 
G{p) is globally rigid at all generic configurations p 15411551 . 
Though it has been conjectured for many years that global 
rigidity is a generic property, this fact was shown to be true 
only very recently. The seminal work of 1551 [49l proves that 
global rigidity is a generic property of the graph in each di¬ 
mension. The conditions of Hendrickson as stated in Theo¬ 
rem!^ are necessary for generic global rigidity. They are also 
sufficient on the line, and in the plane l56l . However, by a 
result of Connelly l54l . in 3-space is generically edge- 
2-rigid and 5-connected but is not generically globally rigid. 

One of the tools used in testing for global rigidity of 
frameworks relies on the notions on stress matrices, more 
popular perhaps in the engineering community. A stress is 
defined an assignment of scalars Wij to the edges of the given 
graph G such that for every node i e V it holds that 



( 8 ) 


j: {i,j)eE 


Alternatively, it can be show that a stress is a vector w in the 
left null space of the rigidity matrix: Rg{p)^w = 0. A stress 
vector can be rearranged into an n x n symmetric matrix Q, 
known as the stress matrix, such that for i ^ j, the (i,j) 
entry of ft is ftij = —uoij, and the diagonal entries for (i, i) 
are flu = ^j .Since all row and column sums are 
zero, it follows that the all-ones vector (11 • • • 1)^ is in 
the null space of ft as well as each of the coordinate vectors 
of the configuration p. Therefore, it follows that for generic 
configurations the rank of the stress matrix is at most n — 
(d + 1). The following pairs of theorems give sufficient and 
necessary conditions for generic global rigidity: 

Theorem 2 (Connelly EH). Ifp is a generic configuration in 

such that there is a stress, where the rank of the associated 
stress matrix ft is n — (d + 1), then G{p) is globally rigid in 

Theorem 3 (Gortler, Healy, and Thurston 14^ ). Suppose that 
p is a generic configuration in such that G{p) is globally 
rigid in Then either G{p) is a simplex or there is a stress 
where the rank of the associated stress matrix ftis n—{d^l). 

Based on the latter theorem, the authors of 14^ also 
provided a randomized polynomial algorithm for checking 
generic global rigidity of a graph (42 Algorithm 3.3], which 
we use to test for global rigidity of the patches in the kNN- 
recovery problem. If a given patch is generically locally rigid 
then their algorithm picks a random stress vector of the left 
null space of the rigidity matrix associated to this patch, and 
converts it into a stress matrix. If the rank of the stress matrix 
is exactly n — (d -f 1), then we conclude that the patch is 
globally rigid, and if the rank is lower, then the respective 
patch is not globally rigid. 
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Fig. 16. ASAP LOE BEGS MPS=300, n = 5000, k increas¬ 
ing by 20, Top left : originally sampled points. Remaining 
plots : recovered embeddings 
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